Istituto Di Matematica Computazionale

Authors

  • Paolo Ferragina
  • Giovanni Manzini
Abstract

The volume of electronic data is currently growing faster than computer memory and disk storage capacities. For this reason compression is always an attractive choice, if not a mandatory one. However, space occupancy is not the only resource to be optimized when managing large data collections; data are useful only when properly indexed to support search operations that efficiently extract the user-requested information. Approaches that combine compression and indexing techniques are receiving increasing attention. A first step towards the design of a compressed full-text index achieving guaranteed performance in the worst case was recently taken in [10]. The novelty of that index lies in the careful combination of the compression algorithm proposed by Burrows and Wheeler [6] with the suffix array data structure [16]. The index is opportunistic in that, although no assumption is made on a particular fixed distribution, it takes advantage of the compressibility of the input data by decreasing the space occupancy at no significant asymptotic slowdown in query performance. In this paper we present an implementation of this index and perform an extensive set of experiments on various text collections. These experiments allow us to highlight properties and drawbacks of the proposed solution, and to identify some interesting scenarios where this novel index may find effective application.

∗ Dipartimento di Informatica, Università di Pisa, Italy. Supported in part by Italian MURST project “Algorithms for Large Data Sets: Science and Engineering” and by UNESCO grant UVO-ROSTE 875.631.9.

† Dipartimento di Scienze e Tecnologie Avanzate, Università del Piemonte Orientale, Alessandria, Italy and IMC-CNR, Pisa, Italy. Supported in part by MURST 60% funds.
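To make the combination of the Burrows-Wheeler transform with the suffix array more concrete, the following is a minimal sketch of counting pattern occurrences by backward search over the transformed text. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names (`suffix_array`, `bwt_from_sa`, `build_tables`, `count_occurrences`), the `$` sentinel, and the uncompressed occurrence tables are all choices made here for clarity. An actual compressed index would store these tables in compressed form.

```python
from collections import Counter

def suffix_array(text):
    # Naive construction (illustrative only): sort suffix start positions
    # by the suffix that begins at each position.
    return sorted(range(len(text)), key=lambda i: text[i:])

def bwt_from_sa(text, sa):
    # The BWT symbol for each sorted suffix is the character preceding it;
    # for the suffix at position 0 this wraps around to the sentinel.
    return "".join(text[i - 1] for i in sa)

def build_tables(bwt):
    # C[c]: number of characters in the text strictly smaller than c.
    counts = Counter(bwt)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i]: occurrences of c in bwt[:i]. Kept uncompressed here,
    # which is an assumption of this sketch, not a property of the index.
    occ = {c: [0] * (len(bwt) + 1) for c in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ

def count_occurrences(pattern, C, occ, n):
    # Backward search: [lo, hi) is the range of sorted suffixes that start
    # with the part of the pattern processed so far (right to left).
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo

# Hypothetical example text; "$" is a sentinel smaller than every letter.
text = "mississippi$"
sa = suffix_array(text)
bwt = bwt_from_sa(text, sa)
C, occ = build_tables(bwt)
print(bwt)
print(count_occurrences("ssi", C, occ, len(text)))
```

Each pattern character costs two table lookups, so the counting time depends on the pattern length rather than on the text length. The sketch omits locating the positions of the occurrences, which requires additional information derived from the suffix array.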

Similar resources

On power series solutions for the Euler equation, and the Behr-Nečas-Wu initial datum

Carlo Morosi , Mario Pernici , Livio Pizzocchero c () a Dipartimento di Matematica, Politecnico di Milano, P.za L. da Vinci 32, I-20133 Milano, Italy e–mail: [email protected] b Istituto Nazionale di Fisica Nucleare, Sezione di Milano, Via Celoria 16, I-20133 Milano, Italy e–mail: [email protected] c Dipartimento di Matematica, Università di Milano Via C. Saldini 50, I-20133 Milano,...

Mass lesion detection in mammographic images using Haralik textural features

Istituto Nazionale di Fisica Nucleare (INFN)-Bari, Italy Universitá and INFN di Bari, and Center of Innovative Technologies for Signal Detection and Processing, Italy Dipartimento di Fisica, Università di Siena and INFN-Cagliari, Italy Struttura Dipartimentale di Matematica e Fisica, Università di Sassari and INFN-Cagliari, Italy Istituto Nazionale di Fisica Nucleare-Torino, Italy Dipartimento ...

Generation of a minimal set of templates in MR neuroimages

Dipartimento di Matematica e Fisica “E. De Giorgi”, Università del Salento, Italy Istituto Nazionale di Fisica Nucleare sez. di Lecce, Lecce, Italy Dipartimento di Fisica, Università di Bari, Italy Istituto Nazionale di Fisica Nucleare sez. di Bari, Genova, Italy Dipartimento di Fisica, Università di Genova, Italy f Istituto Nazionale di Fisica Nucleare sez. di Genova, Genova, Italy Dipartiment...

Hourglass stabilization and the virtual element method

1 Department of Mathematics, University of Leicester, University Road – Leicester LE1 7RH, UK 2 Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM 87545, USA 3 Istituto di Matematica Applicata e Tecnologie Informatiche del CNR, via Ferrata 1, 27100 Pavia, Italy, 4 Centro per la Simulazione Numerica Avanzata, Istituto Universitario di Studi Superiori, 27100 Pavia, Italy 5 Dipar...

Memory beyond memory in heart beating: an efficient way to detect pathological conditions

P. Allegrini, P. Grigolini, P. Hamilton, L. Palatella, G. Raffaelli 1 Istituto di Linguistica Computazionale del Consiglio Nazionale delle Ricerche, Area della Ricerca di Pisa-S. Cataldo, Via Moruzzi 1, 56124, Ghezzano-Pisa, Italy Center for Nonlinear Science, University of North Texas, P.O. Box 311427, Denton, Texas, 76203-1427 Dipartimento di Fisica dell’Università di Pisa and INFM Piazza Tor...

Analysis-suitable T-splines of arbitrary degree: definition and properties

L. Beirão da Veiga∗, A. Buffa†, G. Sangalli‡, R. Vázquez† ∗Dipartimento di Matematica Università di Milano Via Saldini 50, 20133 Milano, Italy. [email protected] †Istituto di Matematica Applicata e Tecnologie Informatiche Centro Nazionale delle Ricerche Via Ferrata 1, 27100 Pavia, Italy. [email protected] [email protected] ‡Dipartimento di Matematica Università di Pavia Via ...

Publication date: 2000